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Abstract 

We use a statistical-mechanical identity closely related to the familiar virial theorem, to derive 
unbiased estimators for spatial distribution functions of classical fluids. In particular, we obtain 
estimators for both the fluid density p(r) in the vicinity of a fixed solute, and for the pair 
correlation g(r) of a homogeneous classical fluid. We illustrate the utility of our estimators 
with numerical examples, which reveal advantages over traditional histogram-based methods of 
computing such distributions. 
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I. INTRODUCTION 



In molecular simulations of fluids, one is often interested in estimating a radial distribution 
function p(r), that is, the average density of particles as a function of distance r away from 
either a tagged reference particle (the pair correlation function of uniform fluids^]), or a 
fixed point in space (useful in the context of nonuniform fluids [2]). Almost invariably, such 
estimates are constructed using histograms, with p(r) obtained by counting particle visits 
to a spherical shell of radius r and thickness Ar centered about the point of interest 
While simple to implement, this approach inevitably introduces systematic error, or bias, as 
the intended estimator of p{r) is in reality an estimator of the particle density averaged over 
the finite thickness Ar of the shell. This bias can of course be made small by reducing Ar, 
but only at the price of increased statistical error: the smaller the bin size Ar, the fewer the 
particles found in a typical shell. 

In this paper we show how to construct unbiased estimators of radial distribution func- 
tions. Upon ensemble averaging these estimators coincide with the desired observables, 
thus eliminating the systematic error that arises in the histogram methods, and altogether 
avoiding the problem of finite bin size. 

Our estimators express a local, surface quantity in terms of a volume average; for a given 
r, contributions to the estimate of p(r) come from particles throughout the bulk of the 
fluid, and not only from those in a thin spherical shell at r. This approach is similar to the 
widely used virial method for computing the pressure p of a fluid |4(: while p can be defined 
operationally as the force per unit area exerted by the fluid on the walls of its container (a 
surface quantity), the virial theorem re-expresses p as a sum of the ideal gas pressure and an 
exact correction given in terms of particle-particle forces throughout the bulk of the fluid. 

In what follows we first introduce the identity that underlies our approach (EqJHJ, then 
we apply this identity to obtain estimators of commonly studied radial distribution functions 
(EqsEl E2 EUl E01 ■ We present numerical evidence illustrating the practical utility of 
our approach in the context of a Lennard- Jones fluid, and we conclude with a discussion of 
potential extensions applicable to more elaborate situations. 
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II. THEORY 



Consider a fluid of N identical, classical particles confined within a cubic simulation box 
of dimensions L x L x L and volume V = L 3 , with periodic boundary conditions. 1 Let 
$(r 1; . . . , rjv) be the potential energy expressed in terms of the locations of the particles. 
This potential includes both particle-particle interactions and nonuniform terms such as the 
interaction of each particle with a fixed solute. The density p(r) in the canonical ensemble 
is given by: [3] 

P^ = Y^ [dr^e~^-^\ ri=r , (1) 
r i=i 



v 



where J v dr^~^ denotes an integration over the system volume V and all coordinates except 
rj, and Z r = J dri ■ ■ ■ dr^e'^ the configurational part of the partition function. Let us now 
choose an arbitrary three-dimensional vector field u(r), and apply the divergence (Gauss's) 
theorem to the product pu: 



da - up = / dr pV • u + / dr u • V p. (2) 

an Jn Jn 

The two terms on the right are integrals over an arbitrary volume Q contained entirely 
within the simulation box; the term on the left is an integral over the surface dQ of this 
volume. EqsQ and El combine to give 



® da - up = / dr pV • u + j3 ( Uj • Fj ) 

Jan Jn \ ign / 



(3) 



where the sum is over all particles inside the domain Q, the notation Uj is shorthand for 
u(i"j), and Fj = — is the net instantaneous force on particle i. The angle brackets denote 
the canonical average, (•••) = Z~ l f dr x ■ ■ ■ dvj^e -13 ^ 

In the examples discussed below, the force Fj on a given fluid particle includes contri- 
butions from interactions with the other (N — 1) fluid particles, as well as a contribution 
from a potential U(r), which represents a fixed solute whose presence is felt by all N fluid 
particles. Since it is often convenient to distinguish between these two contributions, we 
will use the notation fj to denote the force on particle i arising specifically from interactions 



1 As often the case when using periodic boundary conditions, we assume that the interaction potential 
between pairs of particles has a finite range no greater than L/2, so that each particle interacts with at 
most one image of each other particle. 
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with the other fluid particles, and — Vj{7(rj) to denote the force due to the solute; thus, 
Fj = -ViZ7( ri ) + fi. 

For the special choice u(r) = r, Eq|H]above reduces to a result derived by Henderson (Eq.9 
of Ref. 5^ see also Ref.Q), which relates the density of particles on the surface dQ to the 
density within the volume Q. In that case the last term in Eq|3]is very similar to the familiar 
Clausius virial^j]. More generally, however, Eq|3] encompasses a large family of virial-like 
theorems, each associated with a particular choice of u. Such "hypervirial" relations have 
been studied by Hirschfelder p] , though they have been derived and expressed in a different 
way. We now show how, by choosing u(r) to be discontinuous over the surface of a sphere 
of radius R, we obtain useful identities for the fluid density averaged over this surface. 



A. Hard spherical solute 

To begin, let us consider a fluid in the presence of a fixed hard spherical solute. This 
simple solvation model plays an important role in the study of nonuniform fluid theories 

BBS 

. A quantity of particular interest is the contact density, p c (the particle density in 
the immediate vicinity of the hard sphere) , which is closely related to the free energetic cost 
of "growing;" such a Wsphe, solute . the fluid 0. 

We will use a cartesian coordinate system whose origin r = is at the center of the box, 
and we assume that this point also coincides with the center of the hard solute. Let _R so i 
denote the radius of the solute, -R max = L/2 the radius of the largest sphere that fits inside 
the simulation box, and Q the thick spherical region between these two radii. That is, Q 
is defined by _R so i < r < -R max , and the volume of this region is Vh = (47r/3)(i? max — -Rg i). 
With the aim of determining the particle density at a distance R from the center of the 
solute (where i? so i < R < -R max ), let us construct the vector field 



'r 3 -R 3 sol 



u(r;R) 



, r, (R sol <r<R) 



r 3 _ f? 3 

' r, (R < r < i? max ) 



(4) 



3r 2 

where the notation u(r; R) indicates the parametric dependence of u on the radius of interest 
R. This field vanishes at the boundaries of Q (i.e. at R so \ and -R max ), therefore so does the 
term on the left side of Eq|S] The divergence of u inside the region Q is easily calculated to 



be 



V.u = l- i ^tf(r-i2), (5) 



where the delta- function arises from the discontinuity of u at the surface |r| = R. From this 
we get 

/ drpV-u = (N a )-V n p(R), (6) 
where (Nq) = j n dr p(r) is the average number of particles found in the region Q, and 

P( R ) = ^2 J drp(v)5(r-R) (7) 

is the particle density, averaged over the spherical surface r = R. Combining this result 
with Eq|21 and rearranging terms finally gives us: 



p(R) = y 



(8) 



(N a )+p( J^Ui-f, 
\ «en 

where Uj = u(rf, R). Note that we have replaced Fj in Eq£J]by fj above; because the region 
Q (-Rsoi < r < -R max ) was defined to be entirely outside the region of space occupied by the 
solute (r < -R S oi) , the net force felt by any particle that is inside Q includes only contributions 
from the other fluid particles. Eq|H] is an exact result that expresses the surface-averaged 
density p{R) in terms of a bulk average: all the particles inside Q contribute to the right 
side of EqE 

As a consistency check, we note that if our fluid is an ideal gas, then fj is identically 
zero, and EqJH] becomes p(R) = (Nn) /Vn- This is as expected: in the ideal gas limit the 
fluid density is independent of the distance away from the solute, and is therefore equal to 
the average number of particles in any given region of space, divided by the volume of that 
region. Thus for the more general case of a non-ideal fluid, the two terms inside the square 
brackets in Eq|H] can be interpreted as, respectively, an ideal gas contribution to the radial 
distribution function, and a virial-like correction due to interparticle interactions. 

EqlHl is equivalent to the statement that the quantity 



p(T;R) = ^(N n + f3j2^-^) 



(9) 



is an unbiased estimator of the radial distribution function p(R). Here, T = (ri, r2, • • ■ , rjy) 
is a single N-particle configuration of our system, and the quantity on the right side above is 
well defined for any such configuration. Thus, Nq = Nq(T) is the number of fluid particles 



inside f2, for the configuration T. By the term "unbiased estimator", we mean simply 
that the average of p(T; R), over configurations T sampled from an equilibrium (canonical) 
ensemble, is identically equal to the desired density p(R); that is, p(R) = (p(T; R)). This is 
the content of EqJHJ 

In the example that we have just presented, u(r; R) was chosen so as to (a) cause the 
surface term in Eq. (jHJ) to vanish, and (b) "pick out" the density at a particular distance R 
from the origin via a discontinuity at R. We now show how the same strategy can be applied 
when the solute is no longer perfectly hard. This will pave the way to the consideration of 
particle-particle distribution functions in homogeneous fluids. 

B. General spherical solute 

Consider a solute described by a fixed spherically symmetric potential U(r). Recalling 
that Fj = — ViUfri) + ft, Eq. (JHJ) can be rewritten as 



where 




Essentially, the solute contribution to Fj in the second term on the right side of Eq|3J has 
been transferred to the first term on the right. 

Let us now take Q to be the entire region of space enclosed by a sphere of radius 
-Rmax — L/2. By analogy with the hard sphere example, let us search for a vector field 
u(r; R) that: (a) vanishes on the surface of Q, (b) is discontinuous at r = R, and (c) is 
spherically symmetric and oriented along the radial direction, i.e. u(r; R) = u(r; R) r. We 
also want «(0; R) = 0, to avoid a singularity at the origin. Note that these properties do 
not uniquely specify u(r; R). In the following two paragraphs we will construct two different 
fields satisfying these conditions fEqsIT^IandfTTj) and we will present the unbiased estimators 
associated with these fields fEqsJ16landll9|). In the Appendix, we show how this approach 
can be generalized. 

With EqElin mind, let us first search for a field u(r; R) that satisfies 

C-u= 1- a5(r - R), (12) 
6 



where a is a constant whose value is determined by demanding that u vanish at the origin 
and at the boundary of Q (r — R ma . x ). The solution of Eq[T2]is obtained by straightforward 
integration: 

f Q dss 2 e- pu ^ s \ 0<r<R 
u = r— . (13) 

/I dss 2 e~^ u ^. R<r<R max 

V u J* max 

This field clearly satisfies the desired boundary conditions, and by explicit evaluation we 
can confirm that 



t ■ u = 1 - 5(r - R) 



AttR 2 



Q 



U, 



where 



Qi 



47T 



dss 2 e-^ s \ 



(14) 



(15) 



Combining EqsEl and El and carrying out the sort of rearrangement of terms that led to 
EqsIH]and|ni we arrive at the following unbiased estimator for p(R): 



p(T; R) 



-I3U(R) 



Q 



V 



(16) 



This is very similar to EqEl In particular, the two terms inside the square brackets represent 
an ideal gas contribution to p(R), and a virial-like correction. As in the example of the hard- 
sphere solute, p(T; R) takes on different values for different configurations T, but its average 
over an equilibrium ensemble of configurations is equal to p(R). 

Although EqJ16l is correct, the evaluation of the function u(r; R) involves an integral of 
the form J T ds s 2 e~ l3U ^ (see EqIT3*|). In an actual application of this method this integral 
must generally be carried out numerically. We can avoid this inconvenience by replacing 
the integral in Eq^Jby an analytically tractable expression. For instance, if the solute is 



strongly repulsive within a region r < Ri, then we can replace the factor e 
in the integrand, by the unit step function 9(s — i?i). We then get: 



-f3U(s) 



u(r; R) 

m R) 



C 

3r 2 

JU{r) 



R\)6{r - J2 X ) + {R\ - R^)6(r - R) 



-0U(R) 



Jen* 



AnR 2 



appearing 

(17) 
(18) 

(19) 
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where fi* is the spherical shell defined by R\ < r < R mSLX , whose volume is Vq* = 
(47r/3)(i?^ ax — Rf). This field satisfies the desired conditions listed two paragraphs ago, 
but the evaluation of u(r; R) in this case no longer requires numerical integration. 

Note that the replacement of e~ /3C/( - s ^ (in Eq J13|) by 9(s — R) (to get Eqll7|) simply amounts 
to a different choice of u(r;R). The resulting estimator, Eq ll9l is as exactly unbiased as 
the one given by EqlTol In the Appendix we show that, in fact, both these estimators are 
special cases of a more general expression for p(T; R) (Eq]M|). 

C. Pair correlation function 

We now consider how our method can be applied to the estimation of the pair correlation 
function g(r) of a uniform fluid. As above, we imagine N identical, classical particles inside 
a cubic simulation box of volume V = L 3 and periodic boundary conditions, only now we 
assume that there are no solutes present. The total potential energy of a given configuration 
of paricles is given by Yli<j v ( r ij)i where v(r) is the pairwise interaction potential (e.g. 
Lennard- Jones), is the distance between particles i and j, and the sum is over all pairs 
of particles. 

For purpose of presentation, let us choose one of our N particles to act as a reference 
particle, and imagine for the moment that we hold the position of that particle fixed at the 
center of the simulation box. The pair correlation function is given by g(r) = p(r) / pb, where 
p(r) is the density of particles at a distance r from the reference particle, and pb = (N — l) /V 
is the bulk density of the unconstrained particles (see e.g. Ref. pp. 196-197.) We can 
think of the reference particle as a fixed, spherically symmetric solute, which happens to 
be physically identical to the remaining fluid particles. Since EqslTol and EH both provide 
unbiased estimators for the density profile around such a solute, we can apply either of those 
results to the present case, replacing U(r) by v(r) to get an unbiased estimator for g(R). 
For instance, from EqJ19lwe get 



that particle is being treated as a fixed solute), and similarly i[ is the net force acting on 




(20) 



Here the notation ^ indicates that the reference particle is not included in the sum (since 



S 



particle i from all other particles except the reference particle. The u-field that enters Eql2TJI 
is given by Eq J17l with U(r) replaced by v(r). 

The discussion of the previous paragraph was framed around the idea that a single refer- 
ence particle is held fixed, and g(R) is computed with respect to that particle alone. However, 
in an actual implementation of the method, all N particles would be treated equally during 
the simulation. For every configuration r = (ri, • • • , rjy) generated during such a simulation, 
the right side of Eq|2U] would be evaluated N times, with each particle taking its turn as 
the reference particle. 2 Each of these N evaluations represents a single unbiased estimate 
of g(R); by averaging over all N of them, we obtain better statistics than by the use of only 
a single reference particle. Effectively, the same procedure is used when applying the usual 
histogram method: instead of specifying one reference particle and compiling statistics on 
the relative locations of the other N — l particles, the pair correlation function g is estimated 
by keeping track of all N(N — l)/2 pairwise distances in the fluid. 

D. Generalization to NPT ensemble 

So far in our presentation, we have taken the volume V of the simulation box to be con- 
stant, as appropriate for quantities computed in the NVT equilibrium ensemble. However, 
our method easily extends to the NPT ensemble, which can be viewed as a superposition 
of NVT ensembles. Explicitly, in the NPT case we have 

1 - 

p(R) = ((p(T;R,V))) = lim — £ p(T m ; R, V m ). (21) 

m=l 

The double angle brackets signify an average over both a distribution of system volumes V, 
and for each volume a canonical average over microstates T sampled at that fixed volume. 
The quantity p(T; R, V) is any one of the unbiased estimators derived above, with the depen- 
dence on V (or equivalently, L) made notationally explicit. The final term in the equation 
above refers to an average over M configurations generated during an NPT simulation, in 
which both the microstate T and the volume V fluctuate with time. 

2 The vector then refers to the displacement of the «'th particle relative to the current reference particle, 
i.e. we temporarily treat the location of the reference particle as the origin of the coordinate system. 
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distance from surface of hard solute 



FIG. 1: Estimates of the fluid density obtained from the histogram method (circles) and with 
the unbiased estimator of EqEO (lines) . For the histogram method, we used a bin size Ar = 0.01, 
although only every tenth bin is shown in the figure, to avoid cluttering. When using our unbiased 
estimator, p(r) was evaluated at increments of 0.01, resulting in the smooth curves shown above. 
Finally, the curves corresponding to _R so i = 2.0 and _R so i = 1.0 have been shifted upward by 0.2 
and 0.4 units, respectively, again to avoid cluttering. 



III. NUMERICAL RESULTS 



To test our method, we have performed Monte Carlo simulations of a truncated and 
shifted Lennard- Jones (L J) fluid both in the NVT and NPT ensembles Q| . We have checked 
;he reliability of our simulations with the available literature data on the L J equation of state 
lol | , always finding good agreement within statistical error bars estimated through the block 
average method of Flyvjerg and Petersen l[ 11|. 



As a proof-of-principle test case, we first ran three simulations of a Lennard- Jones fluid 
surrounding a fixed hard spherical solute, in the NPT ensemble, with number of particles 
N = 864, temperature T = 0.85, and pressure p = 0.022. We used a Lennard- Jones cut-off 
radius r c = 2.5 for the fluid particles, and the solute radius was taken as _R so i = 1.0, 2.0, 
and 3.0 for the three simulations. (Here and below we use the reduced LJ units [j| when 
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FIG. 2: Estimates of contact density at the surface of a hard cavity of radius i? so i = 1.0, obtained 
from four separate simulations. The lower points (squares) were obtained with the histogram 
method. The upper points (circles) were obtained using Eq|5J (The connecting lines are guides to 
the eyes.) Each simulation is identified by the number of samples generated, and by the inverse 
bin size used in the estimates based on the histogram method. 

specifying these parameters.) These values are identical to those used to produce Fig. 3 of 
Huang and Chandler Q]. From the data generated during each simulation (2 13 statistically 
independent configurations), we estimated the radial distribution function p(r) using the 
histogram method, as well as with the unbiased estimator given by EqE] above (with u 
given by Eq0J). FigQ shows good agreement between the two. 

Next, to highlight the bias due to finite bin size when using the histogram method - and 
to illustrate that our method does not suffer from this bias - we ran four new simulations of 
a solvated sphere of radius i? so i = 1.0, with the NPT parameters described in the previous 
paragraph. The four simulations differed in duration, generating n s = 2 15 , 2 16 , 2 17 , and 2 18 
samples, respectively. From each simulation we estimated the contact density p c = p(R£ {) 
using both the histogram method (i.e. by counting the number of particles within a distance 
Ar from the surface of the sphere) and the unbiased estimator of EqJHl With the histogram 
method, we varied the bin size Ar from one simulation to the next, keeping the product 
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r 



FIG. 3: The particle-particle distribution function g{r) for a uniform Lennard-Jones fluid, com- 
paring the present method (thick red lines) with the usual histogram (thin, noisy blue lines) for 
different numbers of samples. Each curve was evaluated at points spaced by Ar = 0.01, corre- 
sponding to the bin size used in the histogram method. The curves have been shifted vertically 
in order to facilitate visualization. The error bars of the present method (not shown) are always 
smaller than the histogram ones. 

n s Ar constant so that the number of counts in the bin was roughly the same for each 
simulation. Because p(r) increases sharply as one approaches the solute surface (see the 
upper curve in FigJT]), we expect the histogram method to underestimate p c for any finite 
bin size, but this systematic error should diminish as we decrease the size of the bin. FigEl 
shows the estimates of p c that we obtained from our simulations. The results obtained with 
the histogram method (squares) are consistent with our expectations: with decreasing bin 
size, the estimate of p c increases, suggesting an asymptotic approach to the correct value 
as Ar — ► 0. However, this comes at a price: more samples are needed to maintain the 
same level of statistical error. When using Eq|U] (circles), there is no discernable bias; the 
statistical error bars simply become smaller as the number of samples increases. 

Finally, FigJS] shows estimates of the pair correlation function g(r) obtained from the 
simulation of a uniform Lennard-Jones fluid (no solute) in the NVT ensemble, with N = 108, 
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V = N/pb (where the bulk density was set at pb = 0.70), and T = 0.85. Estimates of g(r) 
were obtained after the generation of 2 9 , 2 11 , and finally 2 15 samples, using both Eql2*Uland 
the histogram method. Although exactly the same data were used for each overlying pair 
of curves, our unbiased estimator furnishes a smoother estimate of g(r), with the difference 
between the two methods diminishing with increasing number of samples. 

IV. CONCLUSIONS 

We have introduced and illustrated a general method for the numerical estimation of 
spatial distribution functions in classical fluids. Our method relies on the construction of 
unbiased estimators (see EqsEJ EH EHl 1201 12H), derived from a virial-like identity (EqCH). 
We have focused in this paper on cases for which the spatial distribution function of interest 
has spherical symmetry, reflecting e.g. the symmetry of the fixed solute, but the extension 
of our method to cases of oddly shaped solutes, or nonuniform averages over possibly non- 
spherical surfaces dfl, is not as difficult as might first appear to be the case 3 , and we hope 
to study this issue in greater detail in future work. Another natural extension would be in 
the direction of complex fluids. 

While the use of simple histograms does not normally pose a serious computational 
challenge to the estimation of spatial distribution functions (at least for simple fluids), the 
numerical results of Section ITTT1 suggest that our method has certain advantages, such as the 
elimination of systematic bias and the generation of smooth results. 
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Appendix 



Here we briefly outline a generalization of the approach that led to EqsEH and EH The 
physical context is that of a fluid surrounding a spherically symmetric solute, but as illus- 
trated in the main text any results obtained in this case can just as well be applied to the 
estimation of the pair correlation function g(r). 

Recall that we want to find a u field that: vanishes at r = -R max ; is discontinuous at 
r = R; and has the form u(r; R) = u(r; R)i, with u(0; R) = 0. These conditions are satisfied 
quite generally by the following field: 

'a(r) 

where <p(r) is an arbitrary function, and #(■) is the unit step function. Because of the way 
that a(r) is defined, u(r; R) points outward (i.e. away from the origin) for r < R, and inward 
for r > R, as was the case with the fields defined by Eqs0J EH and El From Eq|221 we get 



J3U(r) rr 

u(r;R)=r^^ ds s 2 , a(r) = R max ■ 9(r - R), (22) 

r 2 Ja(r) 



C ■ u = e^M 



r) _ 5(r-R) i 
AtcR 2 



(23) 



where = Air J max ds s 2 e~^ <yS \ and C is defined by Eqllll Combining EqsJIOl and |2*B1 and 
once again rearranging terms in the manner that led to Eqs|H] and we get the following 
unbiased estimator for p(R): 



e -f3U{R) 

p(T-R) = —— 



(24) 



where (U - 4>)i = Ufa) - 0(r^. 

The function <ft(r) plays the role of a free parameter, to be chosen so as to maximize 
computational convenience. If we choose 0(r) = U(r), then we arrive at the estimator 
defined by Eqsll3l and ITol Alternatively, we can take <f>(r) to be a hard spherical potential 
(0 = +oo for r < Ri and <fi = for r > Ri) in which case we are led to the results given by 
EqsITUandlli 
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